chore: resync API dumps to current develop source#647

Merged
michalharakal merged 7 commits into develop from chore/resync-api-dumps
May 30, 2026

@michalharakal

Contributor

What

Regenerates the binary-compatibility-validator .api baselines via ./gradlew apiDump with no source changes. The committed dumps had drifted from current source, so apiCheck was failing and therefore effectively unenforced.

Evidence of drift (examples):

  • Q8_0BlockTensorData implements PackedBlockStorage in source, but the committed dump omits it.
  • Several ExecutionContext accessors (memoryPlanner, memoryTracker, scratch, wrapByteArray/FloatArray/IntArray, placeholder) were never re-dumped.

Why

This unblocks apiCheck repo-wide and lets the upcoming Q4_0 feature PRs show only their own ~20-line API deltas instead of mixing in ~1700 lines of unrelated baseline churn.

Scope

Pure mechanical resync — 1737 additions / 49 deletions across 6 modules (skainet-lang-core, skainet-backend-cpu, skainet-compile-{dag,hlo,opt}, skainet-lang-dag). No .kt changes. No new public API.

🤖 Generated with Claude Code

michalharakal and others added 2 commits May 30, 2026 19:38
The committed binary-compatibility-validator baselines had drifted from
the current source — e.g. `Q8_0BlockTensorData` implements
`PackedBlockStorage` in code but the dump didn't reflect it, and several
`ExecutionContext` accessors (`memoryPlanner`, `scratch`, `wrapByteArray`,
…) were never re-dumped. Regenerated via `./gradlew apiDump` with no
source changes, so `apiCheck` is green again repo-wide. No public API
changes here — purely a baseline resync (1737 additions across 6
modules) so subsequent feature PRs show only their own deltas.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Promotes Q4_0 (older GGML 4-bit, 18 bytes / 32 elements) from a
JVM/MemSegment-only side-path to a first-class quantized format that
any loader can produce and any backend can specialize, mirroring Q8_0:

- commonMain `Q4_0TensorData` interface + `Q4_0BlockTensorData` (heap,
  ByteArray-backed) with `toFloatArray()` dequant and PackedBlockStorage.
- `TensorEncoding.Q4_0` (32 elems / 18 bytes).
- `Q4_0MatmulKernel` SPI + `KernelProvider.matmulQ4_0()` (default null)
  and a `"Q4_0"` case in `supports()`.
- `ScalarQ4_0MatmulKernel` (portable commonMain floor) wired through
  `ScalarKernelProvider`.
- `DefaultCpuOpsJvm`: lazy `q4_0MatmulKernel` resolved via KernelRegistry
  + an `is Q4_0TensorData ->` branch in `chooseQuantizedMatmul`.

Uses the canonical ggml *split* nibble layout (low nibbles → elements
0..15, high → 16..31, `(code - 8) * d`) matching
`DequantOps.dequantQ4_0FromBytes` — NOT the interleaved layout the
existing JVM MemSeg `dotQ4_0BlockMemSeg` uses (that mismatch is the
likely reason the Q4_0 MemSeg path was never exercised; PR2 reconciles
it).
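The split layout described above can be sketched as a standalone dequantizer. This is a minimal illustration, not the repository's `DequantOps.dequantQ4_0FromBytes`: the function names, the little-endian FP16 scale in the first two bytes of each block, and the manual half-to-float conversion are assumptions made for the example.

```kotlin
// Hypothetical sketch of split-layout Q4_0 dequantization.
// Assumed block layout: 2 bytes FP16 scale (little-endian) + 16 code bytes.
const val Q4_0_BLOCK_ELEMS = 32
const val Q4_0_BLOCK_BYTES = 18

// Converts an IEEE 754 half-precision bit pattern to a Float.
fun halfToFloat(h: Int): Float {
    val sign = if ((h and 0x8000) != 0) -1f else 1f
    val exp = (h ushr 10) and 0x1F
    val mant = h and 0x3FF
    return when (exp) {
        // Zero and subnormals: mant * 2^-24
        0 -> sign * mant * (1f / 16777216f)
        // Infinity and NaN
        0x1F -> if (mant == 0) sign * Float.POSITIVE_INFINITY else Float.NaN
        // Normal numbers: rebias the exponent from 15 to 127
        else -> Float.fromBits(((h and 0x8000) shl 16) or ((exp + 112) shl 23) or (mant shl 13))
    }
}

// Dequantizes one 18-byte block into 32 floats using (code - 8) * d.
fun dequantQ4_0Block(src: ByteArray, blockOffset: Int, dst: FloatArray, dstOffset: Int) {
    require(blockOffset + Q4_0_BLOCK_BYTES <= src.size) { "block out of range" }
    require(dstOffset + Q4_0_BLOCK_ELEMS <= dst.size) { "destination too small" }
    val scaleBits = (src[blockOffset].toInt() and 0xFF) or
        ((src[blockOffset + 1].toInt() and 0xFF) shl 8)
    val d = halfToFloat(scaleBits)
    for (j in 0 until 16) {
        val b = src[blockOffset + 2 + j].toInt() and 0xFF
        dst[dstOffset + j] = ((b and 0x0F) - 8) * d      // low nibble: elements 0..15
        dst[dstOffset + j + 16] = ((b ushr 4) - 8) * d   // high nibble: elements 16..31
    }
}
```

With a scale of 1.0 (`0x3C00`) and every code byte set to `0x98`, the low nibbles (8) decode to 0 and the high nibbles (9) decode to 1, so the first 16 outputs are 0 and the last 16 are 1.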

Tests: Q4_0TensorDataTest (layout/dequant), Q4_0MatmulDispatchTest
(scalar==dispatch), KernelProviderSupportsTest extended for Q4_0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions

📖 Documentation Preview

The documentation has been built successfully for this PR.

Generated Files:

  • Operator documentation: docs/modules/operators/_generated_/
  • JSON schema output: operators.json

Artifacts:

  • Download the documentation-preview-647 artifact to view the complete documentation locally.

This comment will be updated automatically when the PR is updated.

michalharakal and others added 5 commits May 30, 2026 19:47
Adds `PanamaVectorQ4_0MatmulKernel` (JDK Vector API): per block, decode
the FP16 scale, unpack the 16 code bytes into 32 sign-corrected floats
in the canonical ggml split layout, then SIMD-FMA against the input
window. Wired through `PanamaVectorKernelProvider.matmulQ4_0()` (priority
50), so `DefaultCpuOpsJvm`'s `q4_0MatmulKernel` now prefers it over the
scalar floor on JDK 21+.
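The per-block step can be sketched in scalar Kotlin as a stand-in for the SIMD version. This does not use the JDK Vector API and is not the code of `PanamaVectorQ4_0MatmulKernel`; the function name and signature are hypothetical, the scale is assumed to be already decoded to a `Float`, and applying the scale once after accumulation is a choice made for this example.

```kotlin
// Hypothetical scalar stand-in for one Q4_0 block dot product.
// qs holds 16 code bytes starting at qsOffset; d is the already-decoded scale.
fun dotQ4_0Block(qs: ByteArray, qsOffset: Int, d: Float, x: FloatArray, xOffset: Int): Float {
    // Unpack 16 bytes into 32 sign-corrected floats in the split layout.
    val unpacked = FloatArray(32)
    for (j in 0 until 16) {
        val b = qs[qsOffset + j].toInt() and 0xFF
        unpacked[j] = ((b and 0x0F) - 8).toFloat()       // low nibble: elements 0..15
        unpacked[j + 16] = ((b ushr 4) - 8).toFloat()    // high nibble: elements 16..31
    }
    // A vectorized kernel would replace this loop with lane-wise FMA.
    var acc = 0f
    for (i in 0 until 32) {
        acc += unpacked[i] * x[xOffset + i]
    }
    // Scale once per block rather than per element (an assumption of this sketch).
    return acc * d
}
```

Unpacking into a temporary float buffer first keeps the multiply-accumulate loop free of bit manipulation, which is the shape a vectorizing compiler or a Vector API kernel tends to handle well.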

Also fixes a latent layout bug: the existing JVM MemSegment Q4_0 path
(`JvmQuantizedVectorKernels.dotQ4_0BlockMemSeg` and
`Q4MemorySegmentTensorData` get/set/copyToFloatArray) used an
*interleaved* nibble layout (code[2k]/[2k+1] from byte k), which does
NOT match real GGUF Q4_0 weights (split layout: low nibbles → 0..15,
high → 16..31, per `DequantOps.dequantQ4_0FromBytes`). This mismatch is
the likely reason the Q4_0 MemSeg path was never exercised end-to-end.
All three sites + the test encoder are reconciled to the split layout,
so the MemSeg path now agrees with the heap `Q4_0BlockTensorData`, the
scalar/Panama SPI kernels, and canonical ggml.
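The difference between the two layouts comes down to which byte and which nibble hold element `i`. The sketch below contrasts them using hypothetical helper names; for the interleaved case it assumes the low nibble holds `code[2k]` and the high nibble holds `code[2k+1]`, which the commit message does not spell out.

```kotlin
// Hypothetical helpers: return the raw 4-bit code (0..15) for element i of a block.
// qs is the 16-byte code area of one Q4_0 block.

// Split layout (canonical ggml): low nibbles are elements 0..15, high nibbles 16..31.
fun codeSplit(qs: ByteArray, i: Int): Int {
    val b = qs[i % 16].toInt() and 0xFF
    return if (i < 16) b and 0x0F else b ushr 4
}

// Interleaved layout (the old MemSeg behavior): byte k holds elements 2k and 2k+1.
// Assumption: low nibble is element 2k, high nibble is element 2k+1.
fun codeInterleaved(qs: ByteArray, i: Int): Int {
    val b = qs[i / 2].toInt() and 0xFF
    return if (i % 2 == 0) b and 0x0F else b ushr 4
}
```

For a block whose first code byte is `0x21` and whose remaining bytes are zero, the split reader places code 2 at element 16, while the interleaved reader places it at element 1. Both agree only on element 0, which is why data written in one layout decodes incorrectly in the other.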

Tests: PanamaVectorQ4_0MatmulKernelParityTest (scalar≈panama within FMA
tolerance), QuantizedMemSegMatmulTest still green under split layout.
apiCheck green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Completes the Q4_0 kernel stack with a hand-written C kernel at priority
100. Adds native/src/q4_0_matmul.c (split-layout `(code - 8) * d` decode,
tight auto-vectorizing inner loop mirroring q8_0_matmul.c), declares
skainet_q4_0_matmul in skainet_kernels.h, and adds it to CMakeLists.

Kotlin side: NativeQ4_0MatmulKernel (FFM downcall, mirrors
NativeQ8_0MatmulKernel) wired through NativeKernelProvider.matmulQ4_0().
With the bundled libskainet_kernels loaded, KernelRegistry.bestAvailable()
now prefers native → Panama → scalar for Q4_0, same cascade as Q8_0/Q4_K.
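The native, then Panama, then scalar cascade can be illustrated with a small priority-based lookup. All types and names below are hypothetical and do not reproduce the project's `KernelProvider` or `KernelRegistry` API. Only the priorities 100 (native) and 50 (Panama) come from the commit messages; the scalar priority of 0 used in the usage note is an assumption.

```kotlin
// Hypothetical sketch of priority-based kernel selection.
interface Q4_0KernelSketch {
    val name: String
}

interface KernelProviderSketch {
    val priority: Int
    fun isAvailable(): Boolean
    // Mirrors the "default null" idea: providers without a Q4_0 kernel return null.
    fun matmulQ4_0(): Q4_0KernelSketch? = null
}

class SimpleProvider(
    override val priority: Int,
    private val available: Boolean,
    private val kernelName: String?
) : KernelProviderSketch {
    override fun isAvailable(): Boolean = available
    override fun matmulQ4_0(): Q4_0KernelSketch? =
        kernelName?.let { n -> object : Q4_0KernelSketch { override val name: String = n } }
}

// Picks the kernel from the highest-priority available provider that offers one.
fun bestQ4_0Kernel(providers: List<KernelProviderSketch>): Q4_0KernelSketch? =
    providers
        .filter { it.isAvailable() }
        .sortedByDescending { it.priority }
        .firstNotNullOfOrNull { it.matmulQ4_0() }
```

Given providers at priorities 100, 50, and 0, the lookup returns the native kernel when its provider reports itself available and falls back to the Panama kernel when it does not, which matches the behavior described for CI runs without the native library.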

Verified locally (cmake build): NativeQ4_0MatmulKernelParityTest passes —
native output matches PanamaVectorQ4_0MatmulKernel within FMA tolerance
across matvec / attention / FFN shapes. CI without the native lib stays
green via the same availability gate the other native parity tests use.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
feat(q4_0): native FFM kernel (skainet_q4_0_matmul)
feat(q4_0): first-class Q4_0 core format + scalar kernel + SPI
feat(q4_0): Panama SIMD kernel + reconcile MemSeg to split layout
@michalharakal michalharakal merged commit 2b84824 into develop May 30, 2026
12 checks passed
@michalharakal michalharakal deleted the chore/resync-api-dumps branch May 30, 2026 17:54